Abstract

An  open  system  can  be  modeled  as  a  two-player  game  be¬ 
tween  the  system  and  its  environment.  At  each  round  of  the 
game,  player  1  ( the  system )  and  player  2  ( the  environment) 
independently  and  simultaneously  choose  moves,  and  the 
two  choices  determine  the  next  state  of  the  game.  Proper¬ 
ties  of  open  systems  can  be  modeled  as  objectives  of  these 
two-player  games.  For  the  basic  objective  of  reachabil¬ 
ity  — can  player  1  force  the  game  to  a  given  set  of  target 
states ? —  there  are  three  types  of  winning  states,  according 
to  the  degree  of  certainty  with  which  player  1  can  reach 
the  target.  From  type-1  states,  player  1  has  a  deterministic 
strategy  to  always  reach  the  target.  From  type-2  states, 
player  1  has  a  randomized  strategy  to  reach  the  target  with 
probability 1. From type-3 states, player 1 has for every
real $\varepsilon > 0$ a randomized strategy to reach the target with
probability greater than $1 - \varepsilon$.

We  show  that  for  finite  state  spaces,  all  three  sets  of 
winning  states  can  be  computed  in  polynomial  time:  type- 
1  states  in  linear  time,  and  type-2  and  type-3  states  in 
quadratic  time.  The  algorithms  to  compute  the  three  sets 
of  winning  states  also  enable  the  construction  of  the  win¬ 
ning  and  spoiling  strategies.  Finally,  we  apply  our  results 
by  introducing  a  temporal  logic  in  which  all  three  kinds 
of  winning  conditions  can  be  specified,  and  which  can  be 
model  checked  in  polynomial  time.  This  logic,  called  Ran¬ 
domized  ATL,  is  suitable  for  reasoning  about  randomized 
behavior  in  open  ( two-agent )  as  well  as  multi-agent  sys¬ 
tems. 
1. Introduction

One  of  the  central  problems  in  system  verification  is  the 
reachability  question:  given  an  initial  state  s  and  a  target 
state t, can the system get from s to t? The dynamics of
a  closed  system,  which  does  not  interact  with  its  environ¬ 
ment,  can  be  modeled  by  a  state-transition  graph,  and  the 
reachability  question  reduces  to  graph  reachability,  which 
can  be  solved  in  linear  time  and  is  complete  for  NLOGSPACE 
[Jon75]. By contrast, the dynamics of an open system,
which  does  interact  with  its  environment,  is  best  modeled 
as  a  game  between  the  system  and  the  environment. 

In  some  situations,  it  may  suffice  to  have  the  system 
and  the  environment  take  turns  to  make  moves,  yielding  a 
turn-based  model.  In  this  case,  the  game  graph  is  an  And- 
Or  graph.  A  (deterministic)  strategy  for  the  And  player 
maps  every  path  that  ends  in  an  And  state  to  a  successor 
state,  and  similarly  for  the  Or  player.  Thus  the  reacha¬ 
bility  question  (can  the  system  get  from  s  to  t  no  matter 
what  the  environment  does?)  reduces  to  And-Or  graph 
reachability  (does  the  Or  player  have  a  strategy  so  that  for 
all strategies of the And player, the game, if started in s,
reaches t?). This problem can again be solved in linear
time and is complete for PTIME [Imm81]. With respect to
And-Or  graph  reachability,  randomized  strategies  are  not 
more  powerful  than  deterministic  strategies.  A  randomized 
strategy  for  the  And  player  maps  every  path  that  ends  in 
an  And  state  to  a  probability  distribution  on  the  succes¬ 
sor  states,  and  similarly  for  the  Or  player.  In  turn-based 
models,  it  can  be  seen  that  the  And-Or  graph  reachability 
question  has  the  same  answer  as  the  probabilistic  question 
“does  the  Or  player  have  a  randomized  strategy  so  that  for 
all  randomized  strategies  of  the  And  player,  the  game,  if 
started in s, reaches t with probability 1?”.

The  turn-based  model  is  naive,  because  in  realistic  con¬ 
currency  models,  in  each  state,  the  system  and  the  en¬ 
vironment  independently  choose  moves,  and  the  parallel 
execution of the moves determines the next state. Such
a simultaneous game is a natural model for synchronous
systems where the moves are chosen truly simultaneously,
as well as for distributed systems in which the moves are
not revealed until their combined effect is apparent. In particular,
the modeling of synchronization between processes
often requires the consideration of simultaneous games.

Figure 1. Game LEFT-OR-RIGHT.

The  simultaneous  case  is  more  general  than  the  turn- 
based  one,  and  deterministic  strategies  no  longer  tell  the 
whole  story  about  the  reachability  question.  The  fact  that 
randomized  strategies  can  be  more  powerful  than  deter¬ 
ministic  ones  is  illustrated  by  the  game  LEFT-OR-RIGHT, 
depicted in Figure 1. Initially, the game is at state $t_{throw}$.
At each round, player 1 can choose to throw a snowball
either at the left window (move throwL) or at the right window
(move throwR). Independently and simultaneously,
player 2 must choose to stand behind either the left window
(move standL) or the right window (move standR). If the
snowball hits player 2, the game proceeds to the target state
$t_{hit}$; otherwise, another round of the game is played from
$t_{throw}$.

For each move of player 1, player 2 has a countermeasure.
If we consider only deterministic strategies, then for
every strategy of player 1, there is (exactly one) strategy
of player 2 such that $t_{hit}$ is never reached. Hence, if we
base  our  definitions  on  deterministic  strategies,  we  obtain 
the  answer  No  to  the  reachability  question.  The  situation 
of  player  2,  however,  is  not  nearly  as  safe  as  this  negative 
answer  implies.  If  player  1  chooses  at  each  round  the  win¬ 
dow  at  which  to  throw  the  snowball  by  tossing  a  coin,  then 
player  2  will  be  hit  with  probability  1,  regardless  of  her 
strategy. 
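The effect of the coin toss can be checked with a small exact computation. The sketch below is illustrative only: the function names and the encoding of player 2's behavior as a fixed sequence of window choices are assumptions made here, not part of the paper. Until a hit occurs the history is always a repetition of the same state, so a deterministic strategy of player 2 amounts to such a sequence.

```python
from fractions import Fraction
from itertools import product


def hit_probability(player2_moves, p_left=Fraction(1, 2)):
    """Probability that player 2 is hit within len(player2_moves) rounds.

    player2_moves: sequence of 'L'/'R', the window player 2 stands behind
                   in each round (hypothetical encoding).
    p_left:        probability with which player 1 throws at the left window.
    """
    miss_all = Fraction(1)
    for window in player2_moves:
        hit_now = p_left if window == "L" else 1 - p_left
        miss_all *= 1 - hit_now
    return 1 - miss_all


def worst_case_hit_probability(rounds, p_left=Fraction(1, 2)):
    """Minimum hit probability over all deterministic choices of player 2."""
    return min(hit_probability(seq, p_left)
               for seq in product("LR", repeat=rounds))


# Fair coin: the worst case after n rounds is 1 - 2**-n, which tends to 1.
print(worst_case_hit_probability(3))               # 7/8
# Always throwing left (a deterministic strategy): player 2 simply stands right.
print(worst_case_hit_probability(3, Fraction(1)))  # 0
```

With a fair coin the position of player 2 is irrelevant in every round, which is why no countermeasure exists.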

The  coin-tossing  criterion  used  by  player  1  to  select  the 
move  is  an  example  of  a  randomized  strategy,  and  the  game 
illustrates  the  value  of  randomized  strategies  for  winning 
reachability games. If player 1 adopts a deterministic strategy,
the moves he plays during the game are completely
determined  by  the  history  of  the  game,  which  is  visible  also 
to  player  2.  Once  player  1  has  chosen  a  deterministic  strat¬ 
egy,  player  2  can  choose  her  strategy  to  counteract  every 
move  of  player  1,  as  if  she  were  able  to  see  it  before  choos¬ 
ing  her  own  move.  Randomized  strategies  postpone  the 
choice  of  the  move  until  the  game  is  being  played,  preclud¬ 
ing  this  type  of  spying  behavior.  Another  way  of  thinking 
about  randomized  strategies  is  through  the  concept  of  ini¬ 
tial  randomization.  The  choice  of  a  randomized  strategy  is 
equivalent  to  the  choice  of  a  probability  distribution  over 
the  set  of  deterministic  strategies  [Der70].  By  choosing 
such  a  distribution,  rather  than  a  single  strategy,  player  1 
prevents  player  2  from  tailoring  her  strategy  to  counteract 
the  strategy  chosen  by  player  1.  The  greater  power  of  ran¬ 
domized  strategies  is  a  well-known  fact  in  game  theory,  and 
it has its roots in von Neumann’s minimax theorem [vN28].

Once we consider randomized strategies, we can answer
the  reachability  question  with  three  kinds  of  affirmative 
answers.  The  first  kind  of  answer  is  the  answer  sure: 

Player  1  has  a  strategy  so  that  for  all  strategies 
of  player  2,  the  game,  if  started  in  s,  always 
reaches  t. 

To  establish  this  type  of  answer,  it  suffices  to  consider 
deterministic  strategies  only.  The  second,  weaker  kind  of 
answer  is  the  answer  almost  sure: 

Player  1  has  a  strategy  so  that  for  all  strategies 
of player 2, the game, if started in s, reaches t
with probability 1.

To  establish  this  type  of  answer,  it  is  necessary  to  consider 
randomized  strategies,  as  previously  discussed.  The  third, 
yet  weaker  kind  of  answer  is  the  answer  limit  sure: 

For every real $\varepsilon > 0$, player 1 has a strategy so
that for all strategies of player 2, the game, if
started in s, reaches t with probability greater
than $1 - \varepsilon$.

The  three  kinds  of  answers  form  a  proper  hierarchy,  in 
the  sense  that  there  are  cases  in  which  almost-sure  reach¬ 
ability  holds  whereas  sure  reachability  does  not,  and  cases 
in  which  limit-sure  reachability  holds  whereas  almost-sure 
reachability  does  not.  Note  that  the  second  gap  does  not 
appear  in  reachability  problems  over  Markov  chains,  or 
Markov  decision  processes  [KSK66,  BT91].  While  the 
game  LEFT-OR-RIGHT  witnesses  the  first  gap,  the  second 
gap  is  witnessed  by  the  game  HIDE-OR-RUN,  adapted  from 
[KS81] and depicted in Figure 2. The target state is $s_{home}$,
and the interesting part of the game happens at state $s_{hide}$.
At  this  state,  player  1  is  hiding  behind  a  small  hill,  while 
player 2 is trying to hit him with a snowball. Player 1
can choose between hiding or running, and player 2 can
choose between waiting and throwing her only snowball.
If player 1 runs and player 2 throws the snowball, then
player 1 is hit, and the game proceeds to state $s_{wet}$. If
player 1 runs and player 2 waits, then player 1 gets home,
and the game proceeds to state $s_{home}$. If player 1 hides and
player 2 throws the snowball, then player 1 is no longer in
danger, and the game proceeds to state $s_{safe}$. Finally, if
player 1 hides and player 2 waits, the game stays at $s_{hide}$.

Figure 2. Game HIDE-OR-RUN.

In this game, from state $s_{hide}$ player 1 does not have a
strategy (randomized or deterministic) that ensures reaching
$s_{home}$ with probability 1: in order to reach home regardless
of the strategy of player 2, player 1 may have to
take a chance and run while player 2 is still in possession of
the snowball. On the other hand, if player 1 runs with very
small probability at each round, it becomes very difficult for
player 2 to time her snowball to coincide with the running
of player 1, and a badly timed snowball enables player 1
to reach $s_{home}$. Thus, if player 1 runs at each round with
probability p, when p goes to 0, he is able to reach $s_{home}$
with probability approaching 1 [KS81]. Hence, the answer
to the reachability question is limit sure but not almost sure.
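The limit-sure argument can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions: player 1 runs with the same probability p in every round, player 2 either never throws or throws at one fixed round, and player 1 gets home once the snowball has been wasted (as suggested by the remark that a badly timed snowball lets player 1 reach home). The function names are hypothetical.

```python
from fractions import Fraction


def win_probability(p, throw_round):
    """Probability that player 1 reaches home from the hiding state.

    p:           probability with which player 1 runs in each round.
    throw_round: 0-based round at which player 2 throws her snowball,
                 or None if she waits forever.
    """
    if throw_round is None:
        # Player 1 gets home as soon as he runs; with p == 0 he never does.
        return Fraction(1) if p > 0 else Fraction(0)
    # Player 1 loses only if he hides in all earlier rounds and runs
    # exactly in the round in which the snowball is thrown.
    return 1 - (1 - p) ** throw_round * p


def worst_case(p, horizon=50):
    """Minimum over 'never throw' and 'throw at round k' for k < horizon."""
    candidates = [None] + list(range(horizon))
    return min(win_probability(p, k) for k in candidates)


for p in (Fraction(1, 10), Fraction(1, 1000), Fraction(0)):
    print(p, worst_case(p))
```

Under these assumptions the worst case is 1 - p for every p > 0 (player 2 throws immediately), which approaches 1 as p goes to 0, whereas p = 0 yields 0 because player 2 can wait forever. This matches the claim that the answer is limit sure but not almost sure.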

In  this  paper,  we  study  simultaneous  reachability  games, 
and  we  consider  strategies  for  the  players  that  can  be  both 
randomized  and  history-dependent.  We  focus  on  deter¬ 
ministic  games,  in  which  the  current  state  and  the  players’ 
moves  uniquely  determine  the  successor  state;  the  more 
general  case  of  probabilistic  games,  in  which  the  successor 
state  is  chosen  according  to  a  probability  distribution,  is 
similar, and has been described in [dAHK98].

The  contributions  of  the  paper  are  as  follows.  First, 
we provide efficient algorithms that, given a finite simultaneous
game and a set R of target states, determine the
sets  Sure(R),  Almost(R),  and  Limit(R)  of  initial  states  for 
which  the  answer  to  the  reachability  question  is  sure,  almost 
sure,  and  limit  sure,  respectively.  The  set  Sure(R)  can  be 
determined in linear time [TW68, Bee80]. By contrast, the
determination  of  the  sets  Almost(R)  and  Limit(R)  requires 
quadratic  time.  All  three  algorithms  are  formulated  as 
nested  fixed-point  computations,  and  can  be  implemented 
using symbolic state-space traversal methods [BCM+92].
Our  algorithms  also  enable  the  effective  construction  of 
winning  strategies  for  player  1,  and  spoiling  strategies  for 
player  2,  for  the  three  types  of  answers.  The  correctness 
proofs  for  the  algorithms,  and  for  all  results  of  the  paper, 
can be found in [dAHK98].

Second,  we  characterize  the  three  kinds  of  reachabil¬ 
ity  in  terms  of  the  time  (i.e.,  the  number  of  rounds)  re¬ 
quired  to  reach  a  target  state,  and  in  terms  of  the  types  of 
winning  and  spoiling  strategies  available  to  the  two  play¬ 
ers.  In  particular,  while  the  time  to  target  is  bounded  from 
Sure(R),  only  the  expected  time  to  target  can  be  bounded 
from Almost(R) \ Sure(R). From Limit(R) \ Almost(R),
neither the time to target nor the expected time to target
is bounded. We also show that the spoiling strategies
for  almost-sure  reachability  must  in  general  have  infinite 
memory,  in  contrast  with  the  situation  for  Markov  decision 
processes [Der70] and for limit-sure reachability [KS81].

Third,  we  introduce  a  temporal  logic  for  the  specifica¬ 
tion  of  open  systems,  which  can  be  used  both  for  two-agent 
systems  (system  vs.  environment)  and  for  more  general, 
multi-agent  systems.  The  logic,  called  Randomized  ATL 
(RATL), is an extension of the logic Alternating Temporal
Logic (ATL) of [AHK97]. Both logics let us specify that a set of
agents  has  strategies  to  ensure  that  the  paths  of  the  global 
system  satisfy  given  temporal  properties.  The  logic  ATL 
considers  only  deterministic  strategies;  hence  its  semantics 
is  defined  on  the  basis  of  the  sure  answer  for  reachability 
questions.  The  logic  RATL  considers  instead  randomized 
strategies,  and  it  distinguishes  between  three  kinds  of  sat¬ 
isfaction  for  path  properties:  sure  satisfaction  (as  in  ATL), 
almost-sure  satisfaction,  and  limit-sure  satisfaction.  The 
proper  hierarchy  between  sure,  almost-sure,  and  limit-sure 
reachability  implies  a  proper  hierarchy  for  the  three  kinds 
of  satisfaction.  We  show  that  this  hierarchy  collapses  in  the 
special  case  of  safety  properties,  such  as  invariance.  Our 
algorithms for solving the reachability question for simultaneous
games lead to a symbolic, quadratic-time model-checking
algorithm for RATL.

Related  work.  Polynomial-time  algorithms  to  compute 
the  sets  Almost(R)  and  Limit(R)  are  already  known  for 
one-player games and for turn-based games.

A  one-player  game  is  a  game  in  which  only  one  player 
can choose among more than one move. While deterministic
one-player games are equivalent to graphs, and are thus
easily  analyzed,  probabilistic  one-player  games  are  equiv¬ 
alent  to  Markov  decision  processes.  In  Markov  decision 
processes,  standard  arguments  concerning  the  existence 
of optimal strategies show that Almost(R) = Limit(R);
moreover,  this  set  can  be  computed  in  polynomial  time  by 
a reduction to linear programming [Der70]. If the only
player  is  player  1,  who  attempts  to  reach  set  R,  we  can  also 
compute Almost(R) using the polynomial-time algorithms
independently  proposed  by  [dA97,  Var95],  which  avoid  the 
use  of  linear  programming.  If  the  only  player  is  player  2,  to 
compute  Almost(R)  it  suffices  to  compute  the  set  of  states 
of  a  Markov  decision  process  from  which  R  is  reached  with 
probability  1  under  all  strategies.  This  can  be  done  with 
the  polynomial-time  algorithms  of  [HSP83,  Var85,  CY88], 
which  again  avoid  the  use  of  linear  programming. 

In  deterministic  turn-based  games  the  three  types  of  win¬ 
ning  states  coincide;  that  is,  Sure(R)  =  Almost(R)  = 
Limit(R).  The  problem  of  computing  these  sets  is  equiv¬ 
alent  to  the  previously  mentioned  And-Or  reachability 
problem,  and  the  existence  of  memoryless  deterministic 
winning  and  spoiling  strategies  follows  from  an  analysis  of 
the  algorithms.  In  probabilistic  turn-based  games,  we  have 
Almost(R)  =  Limit(R),  and  these  sets  can  be  computed  in 
polynomial time [Yan98]. The problem of computing these
sets can also be reduced to the one of solving
switching-controller undiscounted games, but this reduction does not
yield a polynomial-time algorithm [VTRF83a]. The problem
of deciding which player has the greatest probability
of winning is in NP ∩ co-NP [Con92].

For  general  reachability  games  with  finite  sets  of  states 
and actions, [KS81] shows the existence of memoryless
$\varepsilon$-optimal strategies for both players. While these results
imply  the  existence  of  memoryless  winning  and  spoil¬ 
ing  strategies  for  limit-sure  reachability,  they  do  not  pro¬ 
vide  methods  for  the  effective  construction  of  these  strate¬ 
gies.  The  maximal  probability  with  which  player  1  can 
force  a  visit  to  R  can  be  computed  with  a  successive- 
approximation  method  proposed  for  total-reward  stochastic 
games [TV87, FV97]. However, we are not aware of previous
criteria for efficiently deciding whether the sequence of
approximations  converges  to  1.  Surveys  of  algorithms  for 
general stochastic games can be found in [RF91, FV97].

2.  Reachability  Games 

A (two-player) game structure $G = \langle S, Moves, \Gamma_1, \Gamma_2, \delta \rangle$
consists of the following components:

•  A  finite  state  space  S. 

•  A  finite  set  Moves  of  moves. 


• Two move assignments $\Gamma_1, \Gamma_2 : S \to 2^{Moves} \setminus \emptyset$. For
$i \in \{1, 2\}$, assignment $\Gamma_i$ associates with each state
$s \in S$ the nonempty set $\Gamma_i(s) \subseteq Moves$ of moves
available to player i at state s. For technical convenience,
we assume that $\Gamma_i(s) \cap \Gamma_j(t) = \emptyset$ unless
$i = j$ and $s = t$, for all $i, j \in \{1, 2\}$ and $s, t \in S$, so
that all moves are distinct.

• A transition function $\delta : S \times Moves \times Moves \to S$,
which associates with every state $s \in S$ and all
moves $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$ a successor state
$\delta(s, a_1, a_2) \in S$.

At every state $s \in S$, player 1 chooses a move $a_1 \in \Gamma_1(s)$,
and simultaneously and independently player 2 chooses a
move $a_2 \in \Gamma_2(s)$; the game then proceeds to the successor
state $\delta(s, a_1, a_2)$. A path of G is an infinite sequence
$\bar{s} = s_0, s_1, s_2, \ldots$ of states in S such that for all $k \geq 0$, there
are moves $a_1^k \in \Gamma_1(s_k)$ and $a_2^k \in \Gamma_2(s_k)$ such that
$s_{k+1} = \delta(s_k, a_1^k, a_2^k)$. We denote by $\Omega$ the set of all paths.

A reachability game (or game, for short) $\mathcal{G} = \langle G, R \rangle$
consists of a game structure G and a set $R \subseteq S$ of target
states; the set R itself is called the target set. The goal
of player 1 in the game $\mathcal{G}$ is to reach a state in the target
set R, and the goal of player 2 is to prevent this. Thus,
a reachability game is a special case of a recursive game
[Eve57].

We  say  that  a  game  structure  G  is  turn-based  if  at  every 
state  at  most  one  player  can  choose  among  multiple  moves; 
that is, for every state $s \in S$ there exists at most one
$i \in \{1, 2\}$ with $|\Gamma_i(s)| > 1$.

In the following, we consider a game
$\mathcal{G} = \langle \langle S, Moves, \Gamma_1, \Gamma_2, \delta \rangle, R \rangle$, unless otherwise noted.
To simplify the presentation of the results, we assume that the
target set R is absorbing; that is, we assume that for all
$s \in R$ and all $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$, we have
$\delta(s, a_1, a_2) \in R$. If R is not absorbing, it is trivial to
obtain an equivalent game with an absorbing target set. We
define the size of the game $\mathcal{G}$ to be equal to the number
of entries of the transition function $\delta$; specifically,
$|\delta| = \sum_{s \in S} |\Gamma_1(s)| \, |\Gamma_2(s)|$.
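One possible in-memory representation of these definitions is sketched below. The class name, field names, and the idle moves added to make the target absorbing are assumptions introduced for illustration; the paper does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass
class GameStructure:
    """Finite two-player game structure <S, Moves, Gamma_1, Gamma_2, delta>."""
    states: set   # S
    moves1: dict  # Gamma_1: state -> nonempty set of moves of player 1
    moves2: dict  # Gamma_2: state -> nonempty set of moves of player 2
    delta: dict   # (state, a1, a2) -> successor state

    def size(self):
        # |delta| = sum over states s of |Gamma_1(s)| * |Gamma_2(s)|
        return sum(len(self.moves1[s]) * len(self.moves2[s])
                   for s in self.states)

    def is_turn_based(self):
        # At every state at most one player chooses among multiple moves.
        return all(len(self.moves1[s]) == 1 or len(self.moves2[s]) == 1
                   for s in self.states)


# LEFT-OR-RIGHT, with hypothetical idle moves so that the target is absorbing.
LEFT_OR_RIGHT = GameStructure(
    states={"t_throw", "t_hit"},
    moves1={"t_throw": {"throwL", "throwR"}, "t_hit": {"idle1"}},
    moves2={"t_throw": {"standL", "standR"}, "t_hit": {"idle2"}},
    delta={
        ("t_throw", "throwL", "standL"): "t_hit",
        ("t_throw", "throwR", "standR"): "t_hit",
        ("t_throw", "throwL", "standR"): "t_throw",
        ("t_throw", "throwR", "standL"): "t_throw",
        ("t_hit", "idle1", "idle2"): "t_hit",
    },
)

print(LEFT_OR_RIGHT.size())           # 5
print(LEFT_OR_RIGHT.is_turn_based())  # False
```

Storing the transition function as a dictionary keyed by (state, move, move) makes the size of the game equal to the number of dictionary entries.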

2.1.  Strategies 

For a finite set A, a probability distribution on A is a
function $p : A \to [0, 1]$ such that $\sum_{a \in A} p(a) = 1$.
We denote the set of probability distributions on A by
$\mathcal{D}(A)$. A strategy for player $i \in \{1, 2\}$ is a mapping
$\pi_i : S^+ \to \mathcal{D}(Moves)$ that associates with every nonempty
finite sequence $\sigma \in S^+$ of states, representing the past history
of the game, a probability distribution $\pi_i(\sigma)$, used
to select the next move. Thus, the choice of the next
move can be history-dependent and randomized. The
strategy $\pi_i$ can prescribe only moves that are available to
player i: for all sequences $\sigma \in S^*$ and states $s \in S$, we
require that if $\pi_i(\sigma s)(a) > 0$, then $a \in \Gamma_i(s)$. We denote
by $\Pi_i$ the set of all strategies for player $i \in \{1, 2\}$.
Given a state $s \in S$ and two strategies $\pi_1 \in \Pi_1$ and
$\pi_2 \in \Pi_2$, we define $Outcomes(s, \pi_1, \pi_2) \subseteq \Omega$ to be the
set of paths that can be followed when the game starts from
s and the players use the strategies $\pi_1$ and $\pi_2$. Formally,
$s_0, s_1, s_2, \ldots \in Outcomes(s, \pi_1, \pi_2)$ if $s_0 = s$ and if for all
$k \geq 0$ there exist moves $a_1^k \in \Gamma_1(s_k)$ and $a_2^k \in \Gamma_2(s_k)$
such that $\pi_1(s_0, \ldots, s_k)(a_1^k) > 0$, $\pi_2(s_0, \ldots, s_k)(a_2^k) > 0$,
and $s_{k+1} = \delta(s_k, a_1^k, a_2^k)$.

Once the starting state s and the strategies $\pi_1$ and $\pi_2$
for the two players have been chosen, the game is reduced
to an ordinary stochastic process. Hence, the probabilities
of events are uniquely defined, where an event $A \subseteq \Omega$
is a measurable set of paths. For an event $A \subseteq \Omega$, we
denote by $\Pr_s^{\pi_1,\pi_2}(A)$ the probability that a path belongs
to A when the game starts from s and the players use the
strategies $\pi_1$ and $\pi_2$. In particular, given a subset $R \subseteq S$
of states, we denote the event of reaching R by
$(\Diamond R) = \{s_0, s_1, s_2, \ldots \in \Omega \mid \exists k \,.\, s_k \in R\}$.
Then, $\Pr_s^{\pi_1,\pi_2}(\Diamond R)$
is the probability of reaching R when the game starts at s,
and players 1 and 2 use strategies $\pi_1$ and $\pi_2$, respectively.

Similarly, for a measurable function f that associates a
number in $\mathbb{R} \cup \{\infty\}$ with each path, we denote by $\mathrm{E}_s^{\pi_1,\pi_2}\{f\}$
the expected value of f when the game starts from s and
the strategies $\pi_1$ and $\pi_2$ are used. In particular, we denote
by $T_{\Diamond R}$ the measurable function that associates with each
path $s_0, s_1, s_2, \ldots$ the time $\min\{k \geq 0 \mid s_k \in R\}$ of first
passage in R. Then, $\mathrm{E}_s^{\pi_1,\pi_2}\{T_{\Diamond R}\}$ is the expected time to
reach R, when the game starts at s, and players 1 and 2 use
strategies $\pi_1$ and $\pi_2$, respectively.


Types  of  strategies.  We  distinguish  the  following  types 
of  strategies: 

• A strategy $\pi$ is deterministic if for all $\sigma \in S^+$ there
exists $a \in Moves$ such that $\pi(\sigma)(a) = 1$.

• A strategy $\pi$ is counting if $\pi(\sigma_1 s) = \pi(\sigma_2 s)$ for all
$s \in S$ and all $\sigma_1, \sigma_2 \in S^*$ with $|\sigma_1| = |\sigma_2|$; that is,
the strategy depends only on the current state and the
number of past rounds of the game.

• A strategy $\pi$ is finite-memory if the distribution chosen
at every state $s \in S$ depends only on s itself, and
on a finite number of bits of information about the
past history of the game.

• A strategy $\pi$ is memoryless if $\pi(\sigma s) = \pi(s)$ for all
$s \in S$ and all $\sigma \in S^*$.


2.2.  Classification  of  Winning  States 


A winning state of game $\mathcal{G}$ is a state from which player 1
can  reach  the  target  set  R  with  probability  arbitrarily  close 
to  1.  We  distinguish  three  classes  of  winning  states: 

• The class Sure(R) of sure-reachability states consists
of the states from which player 1 has a strategy to
force the game to R. Precisely, $s \in Sure(R)$ iff
there is $\pi_1 \in \Pi_1$ such that for all $\pi_2 \in \Pi_2$ we have
$Outcomes(s, \pi_1, \pi_2) \subseteq (\Diamond R)$.

• The class Almost(R) of almost-sure-reachability
states consists of the states from which player 1 has
a strategy to reach R with probability 1. Precisely,
$s \in Almost(R)$ iff there is $\pi_1 \in \Pi_1$ such that for all
$\pi_2 \in \Pi_2$ we have $\Pr_s^{\pi_1,\pi_2}(\Diamond R) = 1$.

• The class Limit(R) of limit-sure-reachability states
consists of the states from which for every real $\varepsilon > 0$,
player 1 has a strategy to reach R with probability
greater than $1 - \varepsilon$. Precisely, $s \in Limit(R)$ iff

$$\sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr_s^{\pi_1,\pi_2}(\Diamond R) = 1.$$


Clearly, $Sure(R) \subseteq Almost(R) \subseteq Limit(R)$. There are
games for which both inclusions are strict. The strictness of
the inclusion $Sure(R) \subseteq Almost(R)$ follows from the well-known
fact that randomized strategies are more powerful
than deterministic strategies [vN28], and is witnessed by
the state $t_{throw}$ of the game LEFT-OR-RIGHT. The strictness
of the inclusion $Almost(R) \subseteq Limit(R)$ is witnessed by the
state $s_{hide}$ of the game HIDE-OR-RUN [KS81].


Winning and spoiling strategies. A winning strategy
for sure reachability is a strategy $\pi_1$ for player 1 that
acts as a witness to all states in Sure(R); that is, for
all states $s \in Sure(R)$ and all strategies $\pi_2$ of player 2,
$Outcomes(s, \pi_1, \pi_2) \subseteq (\Diamond R)$. Similarly, a winning strategy
for almost-sure reachability is a strategy $\pi_1$ for player 1
such that for all states $s \in Almost(R)$ and all strategies $\pi_2$
of player 2, $\Pr_s^{\pi_1,\pi_2}(\Diamond R) = 1$. A winning strategy family
for limit-sure reachability is a family $\{\pi_1[\varepsilon] \mid \varepsilon > 0\}$
of strategies for player 1 such that for all reals $\varepsilon > 0$,
all states $s \in Limit(R)$, and all strategies $\pi_2$ of player 2,
$\Pr_s^{\pi_1[\varepsilon],\pi_2}(\Diamond R) > 1 - \varepsilon$.

A spoiling strategy for sure reachability is a strategy
$\pi_2$ for player 2 that acts as a witness to all states
$s \notin Sure(R)$ and all strategies of player 1; that is, for
all states $s \notin Sure(R)$ and all strategies $\pi_1$ of player 1,
$Outcomes(s, \pi_1, \pi_2) \not\subseteq (\Diamond R)$. Similarly, a spoiling strategy
for almost-sure reachability is a strategy $\pi_2$ for player 2
such that for all states $s \notin Almost(R)$ and all strategies $\pi_1$ of
player 1, $\Pr_s^{\pi_1,\pi_2}(\Diamond R) < 1$. Finally, a spoiling strategy for
limit-sure reachability is a strategy $\pi_2$ for player 2 such that
there exists a real $q > 0$ such that for all states $s \notin Limit(R)$
and all strategies $\pi_1$ of player 1, $\Pr_s^{\pi_1,\pi_2}(\Diamond R) < 1 - q$.

                       Reachability
                       Sure     Almost    Limit
Complexity             O(n)     O(n^2)    O(n^2)
Winning strategies     DM       M         M
Spoiling strategies    M        C         M
Time                   Bnd      Unb       Unb
Expected time          Bnd      Bnd       Unb

Table 1. Overview of results about sure, almost-sure,
and limit-sure reachability. The input size
of the game is indicated by n. The abbreviations
DM, M, C stand for deterministic memoryless,
(randomized) memoryless, and (randomized)
counting, respectively; the abbreviations
Bnd and Unb stand for bounded and unbounded.

We  will  show  that  for  all  three  types  of  reachability, 
winning  and  spoiling  strategies  always  exist. 

2.3.  Overview  of  Our  Results 

In  Table  1  we  present  an  overview  of  the  main  results  on 
reachability  games  that  are  presented  in  this  paper.  The 
first  row  lists  the  complexity  of  the  algorithms  for  com¬ 
puting  the  sets  of  winning  states  with  respect  to  the  three 
types  of  reachability.  The  second  and  the  third  row  list 
the  types  of  winning  and  spoiling  strategies  available  to  the 
players.  For  each  type  of  reachability,  we  list  the  tightest 
class  of  strategies  that  always  contains  at  least  one  such 
winning  and  spoiling  strategy  (according  to  the  classifica¬ 
tion  of  Section  2.1).  The  last  two  rows  state  whether  the 
time  to  the  target,  and  the  expected  time  to  the  target,  are 
bounded  on  the  winning  states. 

For a state $s \in S$ and an integer $m > 0$, we say that
the time from s to R is bounded by m if there exists a
strategy $\pi_1$ for player 1 such that for all strategies $\pi_2$ of
player 2, $\sup\{T_{\Diamond R}(\bar{s}) \mid \bar{s} \in Outcomes(s, \pi_1, \pi_2)\} \leq m$.
If the time from s to R is not bounded by any integer m,
we say that the time from s to R is unbounded. We say that
the expected time from s to R is bounded if there exists an
integer $m > 0$ and a strategy $\pi_1$ for player 1 such that for
all strategies $\pi_2$ of player 2, we have $\mathrm{E}_s^{\pi_1,\pi_2}\{T_{\Diamond R}\} \leq m$.
Note that for every state $s \notin Sure(R)$, the time from s to R
is unbounded, because not all paths reach R, and for every
state $s \notin Almost(R)$, the expected time from s to R is
unbounded, because R is reached with probability always
smaller than 1.
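As an illustration of the difference between the two notions, consider the initial state of LEFT-OR-RIGHT under the coin-tossing strategy of the introduction. This is a worked example added here, not a result taken from the paper. In every round the snowball hits with probability 1/2 whatever player 2 does, so the first-passage time is geometrically distributed:

```latex
\Pr\bigl(T_{\Diamond R} = k\bigr) = 2^{-k} \quad (k \ge 1), \qquad
\mathrm{E}\{T_{\Diamond R}\} = \sum_{k \ge 1} k \, 2^{-k} = 2, \qquad
\Pr\bigl(T_{\Diamond R} > m\bigr) = 2^{-m} > 0 \quad \text{for all } m \ge 0 .
```

Under this strategy the expected time is thus bounded by 2, while no integer m bounds the time itself, in line with the entries Unb and Bnd in the Almost column of Table 1.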

3.  Computing  the  Winning  States 

In  this  section  we  present  algorithms  for  computing  the 
three  sets  Sure(R),  Almost(R),  and  Limit(R). 

3.1.  Sure-Reachability  States 

To  compute  the  set  Sure(R),  we  introduce  the  notion  of 
move  subassignments,  and  the  functions  Pre  and  Stay. 

A move subassignment $\gamma$ for player 1 is a mapping
$\gamma : S \to 2^{Moves}$ that associates with each state $s \in S$ a subset
$\gamma(s) \subseteq \Gamma_1(s)$ of moves. We use move subassignments to
limit the set of moves that player 1 can play during the
game. We denote by A the set of all move subassignments
for player 1.

The  function  Pre  1 :  2s  x  A  k  2s  is  defined  by 

Pre\{U,  7)  = 

{s  6  S  |  3ai  e  7(5)  .  Vci2  6  r2(s)  .  6(s,  a.  1,  af)  G  U}. 

Intuitively,  Pre  \  ( U.  7 )  is  the  set  of  states  from  which 
player  1  can  be  sure  of  being  in  U  after  one  round  us¬ 
ing  only  moves  from  7,  regardless  of  the  move  chosen  by 
player  2.  The  function  Pre-i :  2s  xA  -  2s  is  defined  in  a 
similar  way: 

Pre2((J :  7 )  = 

{s  £  S  |  3 a2  G  r2(s)  .  Vai  e  7(s)  .  6(s,  a.\,  a2 )  G  U}. 

Note  that  in  both  Pre\(U,  7)  and  Pre2(U.  7)  the  subassign¬ 
ment  7  refers  to  player  1.  The  function  Stay ,  :  2s  A  is 
defined  such  that  for  all  states  s  E  S, 

Stayl(U)(s )  = 

{ cji  e  ri(s)  |  Va2  e  r2(s) .  <§(s,  ai,a2)  e  u}. 

Intuitively,  Stay ,  ( U )  is  the  largest  move  subassignment  for 
player  1  that  guarantees  that  the  game  is  in  U  after  one 
round,  regardless  of  the  move  chosen  by  player  2. 
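
The three operators translate almost literally into set comprehensions. The following is a minimal sketch, assuming a dictionary-based encoding of a game (the names `MOVES1`, `MOVES2`, `DELTA` and the placeholder move `"idle"` are hypothetical) and an assumed transition table for the game HIDE-OR-RUN; it is meant as an illustration of the definitions, not as an efficient implementation.

```python
# Assumed toy encoding of HIDE-OR-RUN. "idle" is a hypothetical placeholder
# move for states where neither player has a real choice.
MOVES1 = {"hide": {"hide", "run"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
MOVES2 = {"hide": {"wait", "throw"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
DELTA = {
    ("hide", "hide", "wait"): "hide",
    ("hide", "hide", "throw"): "safe",
    ("hide", "run", "wait"): "home",
    ("hide", "run", "throw"): "wet",
    ("safe", "idle", "idle"): "home",
    ("home", "idle", "idle"): "home",
    ("wet", "idle", "idle"): "wet",
}


def pre1(U, gamma, moves2, delta):
    """States where some player-1 move allowed by gamma forces the game into U."""
    return {s for s in gamma
            if any(all(delta[s, a, b] in U for b in moves2[s]) for a in gamma[s])}


def pre2(U, gamma, moves2, delta):
    """States where some player-2 move forces the game into U against all of gamma."""
    return {s for s in gamma
            if any(all(delta[s, a, b] in U for a in gamma[s]) for b in moves2[s])}


def stay1(U, moves1, moves2, delta):
    """Largest move subassignment for player 1 that keeps the game in U for one round."""
    return {s: {a for a in moves1[s] if all(delta[s, a, b] in U for b in moves2[s])}
            for s in moves1}


U1 = {"hide", "safe", "home"}
print(pre1({"home"}, MOVES1, MOVES2, DELTA))
print(stay1(U1, MOVES1, MOVES2, DELTA)["hide"])
```

In this encoding, `stay1(U1, ...)` keeps only the move `hide` at state `hide`, since `run` risks leaving `U1` when player 2 throws.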

The  set  Sure(  R)  can  be  computed  using  the  following 
algorithm. 

Algorithm  1 

Input: Reachability game $\mathcal{G} = \langle G, R \rangle$.

Output: Sure-reachability set Sure(R).

Initialization: Let $U_0 = R$.

Repeat For $k \ge 0$, let $U_{k+1} = U_k \cup \mathit{Pre}_1(U_k, \Gamma_1)$
Until $U_{k+1} = U_k$.

Return: $U_k$.
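
As an illustration, Algorithm 1 can be sketched as a plain fixed-point loop. This naive version recomputes the predecessor set at every iteration, so it is not the linear-time implementation mentioned in the text; the game encoding (`MOVES1`, `MOVES2`, `DELTA`, the placeholder move `"idle"`) and the HIDE-OR-RUN transition table are assumptions made for the example.

```python
# Assumed toy encoding of HIDE-OR-RUN ("idle" is a hypothetical placeholder move).
MOVES1 = {"hide": {"hide", "run"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
MOVES2 = {"hide": {"wait", "throw"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
DELTA = {
    ("hide", "hide", "wait"): "hide",
    ("hide", "hide", "throw"): "safe",
    ("hide", "run", "wait"): "home",
    ("hide", "run", "throw"): "wet",
    ("safe", "idle", "idle"): "home",
    ("home", "idle", "idle"): "home",
    ("wet", "idle", "idle"): "wet",
}
STATES = set(MOVES1)


def sure_reach(states, moves1, moves2, delta, target):
    """Iterate U_{k+1} = U_k | Pre_1(U_k, Gamma_1) until the set stops growing."""
    U = set(target)
    while True:
        pre = {s for s in states
               if any(all(delta[s, a, b] in U for b in moves2[s]) for a in moves1[s])}
        nxt = U | pre
        if nxt == U:
            return U
        U = nxt


SURE = sure_reach(STATES, MOVES1, MOVES2, DELTA, {"home"})
print(SURE)
```

In this encoding the state `hide` is not sure-winning: whichever move player 1 fixes there, some move of player 2 keeps the game away from `home` for that round.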

The algorithm is identical to the one used for turn-based
games [TW68], and it can be implemented to run in time
linear in the size of the game [Bee80]. The algorithm
can also be implemented symbolically, as a fixed-point
computation on state sets [BCM+92]. The theorem below
summarizes some basic facts about the set Sure(R).


Theorem 1 For a reachability game with target set R:

1. Algorithm 1 computes the set Sure(R). The algorithm
can be implemented to run in time linear in the
size of the game.

2. Player 1 has a memoryless deterministic winning
strategy for sure reachability.

3. Player 2 has a memoryless spoiling strategy for sure
reachability. This spoiling strategy cannot in general
be deterministic.

4. For every state $s \in \mathrm{Sure}(R)$, the time from $s$ to $R$ is
bounded by the size of the state space.

If $R = U_0, U_1, \ldots, U_m = \mathrm{Sure}(R)$ is the sequence of
sets computed by Algorithm 1, a deterministic memoryless
winning strategy consists in playing at each state $s \in U_{k+1} \setminus U_k$
a fixed move in $\mathit{Stay}_1(U_k)(s)$, where $0 \le k < m$. A
simple memoryless spoiling strategy for player 2 consists
in choosing a move uniformly at random from the available
moves at each state. To see that deterministic spoiling
strategies may not exist in general, it suffices to consider
the state $t_{throw}$ of the game LEFT-OR-RIGHT.

Reachability in turn-based games. If a reachability
game with target set $R$ is turn-based, then player 2 has a
deterministic spoiling strategy $\pi_2$ such that $\Pr_s^{\pi_1,\pi_2}(\Diamond R) = 0$
for all strategies $\pi_1 \in \Pi_1$ for player 1 and all states
$s \notin \mathrm{Sure}(R)$. Such a spoiling strategy simply chooses
at each $s \notin \mathrm{Sure}(R)$ one of the moves $b \in \Gamma_2(s)$ such that
$\delta(s, a, b) \notin \mathrm{Sure}(R)$ for all $a \in \Gamma_1(s)$ [Tho95].

This observation leads to the fact that in turn-based
games we have Sure(R) = Almost(R) = Limit(R), i.e.,
the three notions of reachability coincide. Another consequence
of the above observation is that deterministic turn-based
reachability games have “0-1 laws”; that is, for all
states $s \in S$ of a turn-based game,

$$\sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr\nolimits_s^{\pi_1,\pi_2}(\Diamond R) \in \{0, 1\}.$$

This 0-1 law only applies to deterministic, turn-based
games. As an example of a (non-turn-based) deterministic
game without a 0-1 law, consider a one-round version
of the game LEFT-OR-RIGHT. After the only round, the game
moves from the state $t_{throw}$ either to the state $t_{hit}$ or to the
state $t_{missed}$. Then,

$$\sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr\nolimits_{t_{throw}}^{\pi_1,\pi_2}(\Diamond \{t_{hit}\}) = \frac{1}{2}.$$

3.2. Almost-Sure-Reachability States


The algorithm for the computation of the set Almost(R)
uses the function Safe. For $i \in \{1, 2\}$, the function
$\mathit{Safe}_i : 2^S \times A \mapsto 2^S$ associates with each state set $U \subseteq S$ and each
move subassignment $\gamma \in A$ the largest subset $V \subseteq U$ such
that $V \subseteq \mathit{Pre}_i(V, \gamma)$. The set $V$ can be computed
as the limit of the decreasing sequence $V_0 = U, V_1, V_2, \ldots$,
where we take $V_{k+1} = U \cap \mathit{Pre}_i(V_k, \gamma)$ for $k \ge 0$. Hence,
the set $V$ is the largest subset of $U$ that player $i$ can be
sure of not leaving at any time in the future, regardless of
the moves chosen by the other player, given that player 1
chooses moves only according to $\gamma$. Using an appropriate
data structure, as suggested in [Bee80], the computation of
$\mathit{Safe}_i(U, \gamma)$ can be implemented to run in linear time.
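
The decreasing sequence that defines Safe can be sketched directly. The code below is a naive illustration (each iteration recomputes the predecessor set, so it does not achieve the linear bound of [Bee80]); the dictionary encoding, the helper name `pre`, the placeholder move `"idle"`, and the HIDE-OR-RUN transition table are assumptions of the example.

```python
# Assumed toy encoding of HIDE-OR-RUN ("idle" is a hypothetical placeholder move).
MOVES1 = {"hide": {"hide", "run"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
MOVES2 = {"hide": {"wait", "throw"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
DELTA = {
    ("hide", "hide", "wait"): "hide",
    ("hide", "hide", "throw"): "safe",
    ("hide", "run", "wait"): "home",
    ("hide", "run", "throw"): "wet",
    ("safe", "idle", "idle"): "home",
    ("home", "idle", "idle"): "home",
    ("wet", "idle", "idle"): "wet",
}


def pre(i, V, gamma, moves2, delta):
    """Pre_i(V, gamma); gamma always restricts the moves of player 1."""
    if i == 1:
        return {s for s in gamma
                if any(all(delta[s, a, b] in V for b in moves2[s]) for a in gamma[s])}
    return {s for s in gamma
            if any(all(delta[s, a, b] in V for a in gamma[s]) for b in moves2[s])}


def safe(i, U, gamma, moves2, delta):
    """Limit of V_0 = U, V_{k+1} = U & Pre_i(V_k, gamma)."""
    V = set(U)
    while True:
        nxt = set(U) & pre(i, V, gamma, moves2, delta)
        if nxt == V:
            return V
        V = nxt


# Largest subset of S \ {home} to which player 2 can confine the game.
C0 = safe(2, {"hide", "safe", "wet"}, MOVES1, MOVES2, DELTA)
print(C0)
```

Here player 2 can confine the game only to `wet`: from `hide`, no single move of player 2 keeps the game inside the candidate set against both moves of player 1 once `safe` has been removed.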

The  set  Almost(R)  can  be  computed  using  the  following 
algorithm.  The  algorithm  has  running  time  quadratic  in  the 
size  of  the  game,  and  it  can  be  implemented  symbolically 
as  a  nested  fixed-point  computation. 

Algorithm  2 

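Input: Reachability game $\mathcal{G} = \langle G, R \rangle$.

Output: Almost-sure-reachability set Almost(R).

Initialization: Let $U_0 = S$, $\gamma_0 = \Gamma_1$.

Repeat For $k \ge 0$, let

$C_k = \mathit{Safe}_2(U_k \setminus R, \gamma_k)$,

$U_{k+1} = \mathit{Safe}_1(U_k \setminus C_k, \gamma_k)$,

$\gamma_{k+1} = \mathit{Stay}_1(U_{k+1})$

Until $U_{k+1} = U_k$.

Return: $U_k$.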
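
A minimal sketch of Algorithm 2 as a nested fixed-point loop follows. It is a naive illustration rather than the quadratic-time implementation claimed by the text; the dictionary encoding, the helper names, the placeholder move `"idle"`, and the HIDE-OR-RUN transition table are assumptions of the example.

```python
# Assumed toy encoding of HIDE-OR-RUN ("idle" is a hypothetical placeholder move).
MOVES1 = {"hide": {"hide", "run"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
MOVES2 = {"hide": {"wait", "throw"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
DELTA = {
    ("hide", "hide", "wait"): "hide",
    ("hide", "hide", "throw"): "safe",
    ("hide", "run", "wait"): "home",
    ("hide", "run", "throw"): "wet",
    ("safe", "idle", "idle"): "home",
    ("home", "idle", "idle"): "home",
    ("wet", "idle", "idle"): "wet",
}
STATES = set(MOVES1)


def pre(i, V, gamma, moves2, delta):
    """Pre_i(V, gamma); gamma always restricts the moves of player 1."""
    if i == 1:
        return {s for s in gamma
                if any(all(delta[s, a, b] in V for b in moves2[s]) for a in gamma[s])}
    return {s for s in gamma
            if any(all(delta[s, a, b] in V for a in gamma[s]) for b in moves2[s])}


def safe(i, U, gamma, moves2, delta):
    """Largest V within U with V inside Pre_i(V, gamma), by decreasing iteration."""
    V = set(U)
    while True:
        nxt = set(U) & pre(i, V, gamma, moves2, delta)
        if nxt == V:
            return V
        V = nxt


def stay1(U, moves1, moves2, delta):
    """Moves of player 1 that keep the game in U for one round."""
    return {s: {a for a in moves1[s] if all(delta[s, a, b] in U for b in moves2[s])}
            for s in moves1}


def almost_reach(states, moves1, moves2, delta, target):
    U = set(states)                                    # U_0 = S
    gamma = {s: set(m) for s, m in moves1.items()}     # gamma_0 = Gamma_1
    while True:
        C = safe(2, U - set(target), gamma, moves2, delta)   # C_k
        U_next = safe(1, U - C, gamma, moves2, delta)        # U_{k+1}
        if U_next == U:
            return U
        U = U_next
        gamma = stay1(U, moves1, moves2, delta)              # gamma_{k+1}


ALMOST = almost_reach(STATES, MOVES1, MOVES2, DELTA, {"home"})
print(ALMOST)
```

On this encoding the state `hide` is excluded in the second iteration: once player 1 is restricted to moves that never risk `wet`, player 2 can confine the game to `hide` by always waiting.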

The algorithm can be understood as follows. The set $C_0$ is
the largest subset of $S \setminus R$ to which player 2 can confine the
game. Player 1 must avoid entering $C_0$ at all costs: if $C_0$
is entered with positive probability, $R$ will not be reached
with probability 1. The set $U_1$ is the largest set of states
from which player 1 can avoid entering $C_0$. The move
subassignment $\gamma_1$ then associates with each state the set of
moves that player 1 can select in order to avoid leaving $U_1$.
Since $\gamma_1 \subseteq \Gamma_1$, by choosing only moves from $\gamma_1$, player 1
may lose some of the ability to resist confinement. The
set $C_1$ is the largest subset of $U_1 \setminus R$ to which player 2
can confine the game, under the assumption that player 1
uses only moves from $\gamma_1$. The set $U_2$ is then the largest
subset of $U_1$ from which player 1 can avoid entering $C_1$,
and the subassignment $\gamma_2 \subseteq \gamma_1$ guarantees that player 1
never leaves $U_2$. The computation of $C_k$, $U_{k+1}$, and $\gamma_{k+1}$,
for $k \ge 0$, continues in this way, until we reach $m \ge 0$
such that:

• if player 1 chooses moves only from $\gamma_m$, the game
will never leave $U_m$;

• player 2 cannot confine the game to $U_m \setminus R$, even if
player 1 chooses moves only from $\gamma_m$.

At this point, we have $U_{m+1} = U_m = \mathrm{Almost}(R)$.

Theorem 2 For a reachability game with target set R:

1.  Algorithm  2  computes  the  set  Almost(R).  The  algo¬ 
rithm  can  be  implemented  to  run  in  time  quadratic  in 
the  size  of  the  game. 

2.  Player  1  has  a  memoryless  winning  strategy  for 
almost-sure  reachability.  This  winning  strategy  can¬ 
not  in  general  be  deterministic. 

3.  Player  2  has  a  counting  spoiling  strategy  for  almost- 
sure  reachability.  This  spoiling  strategy  cannot  in 
general  be  deterministic,  nor  finite-memory. 

4. For every state $s \in \mathrm{Almost}(R)$, the expected time
from $s$ to $R$ is bounded.

If $S = U_0, U_1, \ldots, U_m = \mathrm{Almost}(R)$ and $\gamma_1, \ldots, \gamma_m$
are the sequences of sets and move subassignments computed
by the algorithm, a memoryless winning strategy
for player 1 consists in playing at each state $s \in U_m$ a
move chosen uniformly at random from $\gamma_m(s)$. Result 4
then follows from results about the stochastic shortest-path
problem [BT91].

The game HIDE-OR-RUN is an example of a game that
does not have a finite-memory spoiling strategy. In fact, it
can be seen that for each finite-memory strategy of player 2,
player 1 has a strategy to get from $s_{hide}$ to $s_{home}$ with probability 1.
To construct an infinite-memory spoiling strategy,
we proceed as follows. Consider the two memoryless
strategies $\pi_2^1$ and $\pi_2^2$ for player 2 defined by

$$\begin{cases} \pi_2^1(s_{hide})(\mathit{throw}) = 0 \\ \pi_2^1(s_{hide})(\mathit{wait}) = 1 \end{cases} \qquad \begin{cases} \pi_2^2(s_{hide})(\mathit{throw}) = 1/2 \\ \pi_2^2(s_{hide})(\mathit{wait}) = 1/2 \end{cases}$$

The strategy $\pi_2^1$ is effective against the strategies of player 1
that wait till player 2 throws the snowball before running.
On the other hand, the strategy $\pi_2^2$ is effective against the
strategies of player 1 that may run before having seen
player 2 throw the snowball. To obtain a spoiling strategy,
which must work in all cases, we “mix” the strategies
$\pi_2^1$ and $\pi_2^2$, as if player 2 could secretly flip a coin before the
game starts to decide which of the two strategies to use. The
idea of flipping a coin before the game starts to determine
which strategy to use is known as initial randomization,
and it contrasts with on-the-fly randomization, which is
the process of flipping coins during the game to choose
the move to be played. Our definition of strategy allows
only on-the-fly randomization. Nevertheless, from [Der70]
we know that initial randomization between finitely many
strategies $\pi^1, \ldots, \pi^m$ can be simulated by a single strategy
$\pi$ that uses on-the-fly randomization only. However,
there is a price to pay: even when strategies $\pi^1, \pi^2, \ldots, \pi^m$
are memoryless, strategy $\pi$ may need infinite memory. In
our case, by mixing the strategies $\pi_2^1$ and $\pi_2^2$ with equal
probability, we obtain the strategy $\pi_2$, defined for all $k > 0$
by

$$\pi_2(s_{hide}^k)(\mathit{wait}) = 2^{(-1/2^k)},$$

where $s_{hide}^k$ is the path consisting of $k$ states $s_{hide}$. It is
easy to check that if player 2 plays according to $\pi_2$, then she
eventually throws a snowball with (cumulative) probability
1/2, consistent with the fact that $\pi_2$ is the “equal probability
mix” of $\pi_2^1$ and $\pi_2^2$. Note that $\pi_2$ is an infinite-memory,
counting strategy.
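
The cumulative probability can be checked numerically. The sketch below assumes the waiting probability $2^{-1/2^k}$ after $k$ consecutive visits to $s_{hide}$, for $k = 1, 2, \ldots$; the function name `prob_never_throw` is hypothetical. The product of the waiting probabilities over the first $n$ rounds is $2^{-(1 - 2^{-n})}$, which tends to 1/2.

```python
import math


def prob_never_throw(rounds):
    """Probability that player 2 has waited in each of the first `rounds` rounds."""
    return math.prod(2.0 ** (-1.0 / 2 ** k) for k in range(1, rounds + 1))


for n in (1, 2, 5, 20, 60):
    # The probability of having thrown by round n approaches 1/2 from below.
    print(n, 1.0 - prob_never_throw(n))
```

Since the throwing probability at each round is positive but the cumulative probability stays below 1, the strategy has to count rounds, which is why it needs infinite memory.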

3.3.  Limit-Sure-Reachability  States 

Similarly to the algorithm for almost-sure reachability, the
algorithm for limit-sure reachability computes a decreasing
sequence $U_0 = S, U_1, U_2, \ldots$ of candidate winning states;
the set Limit(R) is the limit of this sequence. At each
iteration $k \ge 0$, the algorithm determines a set $C_k \subseteq U_k \setminus R$
of states from which player 1 cannot force a visit to $R$
with probability arbitrarily close to 1, and assigns
$U_{k+1} = U_k \setminus C_k$.

The set $C_k$ is also determined in an iterative fashion.
Initially, we set $C_k^0 = U_k \setminus R$; then, we remove states
from this set, computing a decreasing sequence $C_k^0, C_k^1, C_k^2, \ldots$
that converges to $C_k$. To understand how this latter
sequence is computed, consider the stage of these iterations
when sets $U_k$ and $C_k^j$ have been computed, and consider
a state $s \in C_k^j$. From the point of view of player 1, the
situation from $s$ is as follows. By construction, the states in
$S \setminus U_k$ are not winning states, so that player 1 must avoid
leaving $U_k$. Moreover, as $C_k^j \cap R = \emptyset$, player 1 must also
avoid being trapped in $C_k^j$. Hence, player 1 must try to
escape from $C_k^j$, and at the same time avoid leaving $U_k$.

Denote by $\xi_1 \in \mathcal{D}(\Gamma_1(s))$ and $\xi_2 \in \mathcal{D}(\Gamma_2(s))$ the distributions
used by players 1 and 2 at $s$, respectively. Given
a subset $V \subseteq S$ of states, indicate by $p(s, \xi_1, \xi_2)(V)$ the
probability of going from $s$ to $V$ in one round under distributions
$\xi_1$ and $\xi_2$. Consider the ratio

$$\frac{p(s, \xi_1, \xi_2)(S \setminus C_k^j)}{p(s, \xi_1, \xi_2)(S \setminus U_k)} \qquad (1)$$

between the probabilities of escaping from $C_k^j$ and of leaving
$U_k$. If player 1 can choose $\xi_1$ to make the ratio (1)
arbitrarily large, then any attempt of player 2 to confine
the game to $C_k^j$ can involve $s$ only in a transitory fashion:
in fact, infinitely many visits to $s$ would lead to escaping
from $C_k^j$ with arbitrarily high probability, while losing the
game by leaving $U_k$ with negligible probability. On the
other hand, if the ratio (1) cannot be made arbitrarily large,
then player 2 can choose $\xi_2$ so that, at each visit to $s$, the
probability of leaving $C_k^j$ is compensated by a proportional
probability of leaving $U_k$. In this case, player 1 cannot use
state $s$ to escape from $C_k^j$.

These considerations motivate our definition of limit-escape
states. Given the sets $U \subseteq S$ and $C \subseteq U$, and a
state $s \in C$, we say that $s$ is limit escape with respect to $C$
and $U$ if

$$\sup_{\xi_1 \in \mathcal{D}(\Gamma_1(s))} \; \inf_{\xi_2 \in \mathcal{D}(\Gamma_2(s))} \frac{p(s, \xi_1, \xi_2)(S \setminus C)}{p(s, \xi_1, \xi_2)(S \setminus U)} = \infty. \qquad (2)$$

A state $s$ is then removed from $C_k^j$ to form $C_k^{j+1}$ iff it is
limit escape with respect to $C_k^j$ and $U_k$.

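Let us illustrate the algorithm for limit-sure reachability
on the game HIDE-OR-RUN. The algorithm first computes
$C_0 = \{s_{wet}\}$ and $U_1 = \{s_{hide}, s_{safe}, s_{home}\}$. The state $s_{safe}$
is easily eliminated from $C_1^0 = \{s_{hide}, s_{safe}\}$, leading to
$C_1^1 = \{s_{hide}\}$. At $s_{hide}$, player 1 can play either hide or
run. To escape from $C_1^1$ and reach $s_{home}$ with arbitrarily
high probability, player 1 must be “patient” and choose
move run with sufficiently low probability at each round.
Precisely, for every $0 < \varepsilon < 1$, define the distribution
$\xi_1[\varepsilon] \in \mathcal{D}(\Gamma_1(s_{hide}))$ by:

$$\xi_1[\varepsilon](\mathit{run}) = \varepsilon, \qquad \xi_1[\varepsilon](\mathit{hide}) = 1 - \varepsilon. \qquad (3)$$

By using distribution $\xi_1[\varepsilon]$ and letting $\varepsilon \to 0$, player 1 can
make the ratio (1) diverge (for $k = j = 1$); in fact,

$$\lim_{\varepsilon \to 0} \; \inf_{\xi_2 \in \mathcal{D}(\Gamma_2(s))} \frac{p(s, \xi_1[\varepsilon], \xi_2)(S \setminus C_1^1)}{p(s, \xi_1[\varepsilon], \xi_2)(S \setminus U_1)} = \lim_{\varepsilon \to 0} \; \inf_{0 \le q \le 1} \frac{\varepsilon + q - \varepsilon q}{\varepsilon q} = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} = \infty.$$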


The divergence of the ratio between the one-round probability
of escape and the one-round probability of capture
enables player 1 to eventually escape with probability arbitrarily
close to 1. To verify this, let $\pi_1[\varepsilon]$ be the memoryless
strategy for player 1 that uses distribution $\xi_1[\varepsilon]$ at
state $s_{hide}$. Once $\pi_1[\varepsilon]$ is fixed, results on Markov decision
processes ensure that the optimal strategy for player 2 to
avoid reaching $R$ is memoryless (and also deterministic)
[Der70]. Hence, simple calculations show that

$$\inf_{\pi_2 \in \Pi_2} \Pr\nolimits_{s_{hide}}^{\pi_1[\varepsilon], \pi_2}(\Diamond \{s_{home}\}) = 1 - \varepsilon.$$

The fact that $s_{hide} \in \mathrm{Limit}(R)$ follows by taking the limit
$\varepsilon \to 0$ in this equation. This confirms that
$s_{hide} \in \mathrm{Limit}(R) \setminus \mathrm{Almost}(R)$, as we mentioned in the introduction.

There is a relation between the computation of the
sets $C_k$ in the algorithms for almost-sure and limit-sure
reachability. In Algorithm 2, the set $C_k$ is computed by
$C_k = \mathit{Safe}_2(U_k \setminus R, \gamma_k)$. If we expand the computation of
$C_k$, we see that $C_k$ is again computed as the fixpoint of a
decreasing sequence $C_k^0, C_k^1, C_k^2, \ldots$ For $j \ge 0$, a state $s$
is removed from $C_k^j$ if there is $\xi_1$ such that for all $\xi_2$, the
numerator of (1) is nonzero, and the denominator is 0. In
this case, player 1 from $s$ can use $\xi_1$ to escape $C_k^j$ with positive
probability, while not risking a retreat from $U_k$. Such
an $s$ is called a safe-escape state. For almost-sure reachability,
player 1 must use safe escape, because in order to
reach the target with probability 1 he cannot risk losing.
For limit-sure reachability, player 1 can instead use limit
escape: as long as the ratio between escape (towards the
target) and risk (of retreat) can be made arbitrarily large,
the player can reach the target with probability arbitrarily
close to 1.

3.3.1 Computing Limit-Escape States

The following algorithm determines whether a state is limit
escape.

Algorithm 3

Input: Game structure $G$, two sets $C \subseteq U \subseteq S$ of states,
and a state $s \in C$.

Output: Yes if $s$ is limit escape with respect to $C$ and $U$,
No otherwise.

Initialization: Let $B_{-1} = \emptyset$.

Repeat For $k \ge 0$, let

$A_k = \{ a \in \Gamma_1(s) \mid \forall b \in \Gamma_2(s) \,.\, \text{if } \delta(s, a, b) \notin U \text{ then } b \in B_{k-1} \}$,

$B_k = \{ b \in \Gamma_2(s) \mid \exists a \in A_k \,.\, \delta(s, a, b) \notin C \}$


Until $A_{k+1} = A_k$ and $B_{k+1} = B_k$.

Return: Yes if $B_k = \Gamma_2(s)$, No otherwise.

We say that a move $a \in \Gamma_1(s)$ is labeled if $a \in A_k$ for some
$k \ge 0$; if $a$ is labeled we define $\ell(a) = \min\{ i \mid a \in A_i \}$.
Similarly, we say that a move $b \in \Gamma_2(s)$ is labeled if $b \in B_k$
for some $k \ge 0$. The algorithm declares the state $s$ limit
escape with respect to $C$ and $U$ iff all moves $\Gamma_2(s)$ for
player 2 at $s$ are labeled. When Algorithm 3 is given as
input state $s_{hide}$ of the game HIDE-OR-RUN and $C = \{s_{hide}\}$,
$U = \{s_{hide}, s_{safe}, s_{home}\}$, it labels the moves of player 1 at
$s_{hide}$ with

$$\ell(\mathit{hide}) = 0, \qquad \ell(\mathit{run}) = 1. \qquad (4)$$

If a state $s$ is declared limit escape, then also all moves
in $\Gamma_1(s)$ are labeled, and their labels provide us with an
$\varepsilon$-indexed family $\xi_1[\varepsilon]$ of distributions that make the ratio
(2) diverge. Precisely, for $0 < \varepsilon < 1/(2|\Gamma_1(s)|)$, the distribution
$\xi_1[\varepsilon]$ plays move $a \in \Gamma_1(s)$ with probability $\varepsilon^{\ell(a)}$ if
$\ell(a) > 0$, and it plays all moves in $\{ a \in \Gamma_1(s) \mid \ell(a) = 0 \}$
uniformly at random with the remaining probability. From
(4), we see that the distribution constructed in this fashion
for the state $s_{hide}$ of the game HIDE-OR-RUN coincides with
the one given in (3).
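
The labeling procedure and the resulting distribution can be sketched as follows. This is an illustration under assumptions: the dictionary encoding, the function names `limit_escape` and `escape_distribution`, and the HIDE-OR-RUN transition table restricted to the state `hide` are all hypothetical choices made for the example.

```python
# Assumed encoding of the state s_hide of HIDE-OR-RUN (only this state is needed).
MOVES1 = {"hide": {"hide", "run"}}
MOVES2 = {"hide": {"wait", "throw"}}
DELTA = {
    ("hide", "hide", "wait"): "hide",
    ("hide", "hide", "throw"): "safe",
    ("hide", "run", "wait"): "home",
    ("hide", "run", "throw"): "wet",
}


def limit_escape(s, C, U, moves1, moves2, delta):
    """Run the labeling of Algorithm 3; return (answer, labels of player-1 moves)."""
    A, B = set(), set()          # B starts as B_{-1} = empty set
    label, k = {}, 0
    while True:
        A_new = {a for a in moves1[s]
                 if all(delta[s, a, b] in U or b in B for b in moves2[s])}
        B_new = {b for b in moves2[s]
                 if any(delta[s, a, b] not in C for a in A_new)}
        for a in A_new:
            label.setdefault(a, k)   # first index at which a enters A_k
        if (A_new, B_new) == (A, B):
            return B == set(moves2[s]), label
        A, B, k = A_new, B_new, k + 1


def escape_distribution(label, eps):
    """Play a with probability eps**label[a] if label[a] > 0; split the rest uniformly."""
    dist = {a: eps ** l for a, l in label.items() if l > 0}
    zero = [a for a, l in label.items() if l == 0]
    rest = 1.0 - sum(dist.values())
    dist.update({a: rest / len(zero) for a in zero})
    return dist


U1 = {"hide", "safe", "home"}
ESCAPE, LABEL = limit_escape("hide", {"hide"}, U1, MOVES1, MOVES2, DELTA)
DIST = escape_distribution(LABEL, 0.1)   # 0.1 < 1 / (2 * |Gamma_1(s)|) = 0.25
print(ESCAPE, LABEL, DIST)
```

With $C = \{s_{hide}, s_{safe}\}$ instead, no move of player 2 becomes labeled in this encoding, so the same state is reported as not limit escape.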

3.3.2  Computing  Limit-Sure  Reachability  States 

Given the target set $R$ and a subset $U \subseteq S$ with $R \subseteq U$,
the following algorithm computes the largest subset
$\mathit{Cage}(U) = C \subseteq U \setminus R$ that does not contain any limit-escape
state with respect to $C$ and $U$. The set $C$ is computed
as the limit of the previously described decreasing sequence
$C^0, C^1, C^2, \ldots$

Algorithm 4

Input: Reachability game $\mathcal{G} = \langle G, R \rangle$, and $U \subseteq S$ with
$R \subseteq U$.

Output: $\mathit{Cage}(U) \subseteq S$.

Initialization: Let $C^0 = U \setminus R$.

Repeat For $j \ge 0$, let $C^{j+1} =$
$\{ s \in C^j \mid s \text{ is not limit escape w.r.t. } C^j \text{ and } U \}$
Until $C^{j+1} = C^j$.

Return: $C^j$.

The  set  Limit(R)  can  be  computed  using  the  following  algo¬ 
rithm,  which  uses  the  computation  of  Cage  as  a  subroutine. 

Algorithm  5 

Input: Reachability game $\mathcal{G} = \langle G, R \rangle$.

Output: Limit-sure-reachability set Limit(R).

Initialization: Let $U_0 = S$.

Repeat For $k \ge 0$, let $U_{k+1} = U_k \setminus \mathit{Cage}(U_k)$

Until $U_{k+1} = U_k$.

Return: $U_k$.
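
Algorithms 3, 4, and 5 compose into a short nested loop. The sketch below is a direct, unoptimized rendering (it does not attempt the quadratic-time optimization referred to in the text); the dictionary encoding, the function names, the placeholder move `"idle"`, and the HIDE-OR-RUN transition table are assumptions of the example.

```python
# Assumed toy encoding of HIDE-OR-RUN ("idle" is a hypothetical placeholder move).
MOVES1 = {"hide": {"hide", "run"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
MOVES2 = {"hide": {"wait", "throw"}, "safe": {"idle"}, "home": {"idle"}, "wet": {"idle"}}
DELTA = {
    ("hide", "hide", "wait"): "hide",
    ("hide", "hide", "throw"): "safe",
    ("hide", "run", "wait"): "home",
    ("hide", "run", "throw"): "wet",
    ("safe", "idle", "idle"): "home",
    ("home", "idle", "idle"): "home",
    ("wet", "idle", "idle"): "wet",
}
STATES = set(MOVES1)


def is_limit_escape(s, C, U, moves1, moves2, delta):
    """Algorithm 3: iterate the sets A_k, B_k until both stabilize."""
    A, B = set(), set()
    while True:
        A_new = {a for a in moves1[s]
                 if all(delta[s, a, b] in U or b in B for b in moves2[s])}
        B_new = {b for b in moves2[s]
                 if any(delta[s, a, b] not in C for a in A_new)}
        if (A_new, B_new) == (A, B):
            return B == set(moves2[s])
        A, B = A_new, B_new


def cage(U, target, moves1, moves2, delta):
    """Algorithm 4: remove limit-escape states from U minus R until none remain."""
    C = set(U) - set(target)
    while True:
        nxt = {s for s in C if not is_limit_escape(s, C, U, moves1, moves2, delta)}
        if nxt == C:
            return C
        C = nxt


def limit_reach(states, moves1, moves2, delta, target):
    """Algorithm 5: repeatedly subtract Cage(U_k) from U_k."""
    U = set(states)
    while True:
        nxt = U - cage(U, target, moves1, moves2, delta)
        if nxt == U:
            return U
        U = nxt


LIMIT = limit_reach(STATES, MOVES1, MOVES2, DELTA, {"home"})
print(LIMIT)
```

On this encoding the first call to `cage` returns only the state `wet`, and the second returns the empty set, matching the illustration of the algorithm on HIDE-OR-RUN given earlier in this section.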

The  following  theorem  summarizes  the  results  on  limit-sure 
reachability. 


Theorem  3  For  a  reachability  game  with  target  set  R: 

1.  Algorithm  5  computes  the  set  Limit(R).  The  algo¬ 
rithm  can  be  implemented  to  run  in  time  quadratic  in 
the  size  of  the  game. 

2.  Player  1  has  a  family  of  memoryless  winning  strate¬ 
gies  for  limit-sure  reachability.  These  winning  strate¬ 
gies  cannot  in  general  be  deterministic. 

3. Player 2 has a memoryless spoiling strategy for limit-sure
reachability. This spoiling strategy cannot in
general be deterministic.

To obtain a version of the algorithm that runs in quadratic
time it is necessary to optimize the implementation of Algorithm 4;
the optimized version is given in [dAHK98].
Results 2 and 3 are from [KS81]; the construction of the
winning and spoiling strategies is explained in [dAHK98].

To see that deterministic memoryless winning strategies
may not exist in general, it suffices to consider the state
$t_{throw}$ of the game LEFT-OR-RIGHT. To see that deterministic
memoryless spoiling strategies may not exist in general,
it suffices to consider again the one-round version of the
game LEFT-OR-RIGHT, in which after the only round the
game moves from the state $t_{throw}$ either to the state $t_{hit}$ or
to the state $t_{missed}$. Then, it is immediate to check that
$\mathrm{Limit}(\{t_{hit}\}) = \{t_{hit}\}$; moreover, by considering the state
$t_{throw}$ we see that there are no deterministic spoiling strategies.

4.  Randomized  ATL 

For the specification and verification of open systems,
[AHK97] introduced the temporal logic Alternating Temporal
Logic (ATL). The logic ATL is interpreted over multi-player
game structures, and includes formulas of the form
$\langle\langle A \rangle\rangle \theta$, which asserts that a team $A$ of players (called agents)
has a strategy to ensure that all outcomes of the game satisfy
the path property $\theta$. The semantics of the logic ATL is
defined with respect to deterministic strategies only. Consequently,
in a two-player game structure, if $p_R$ is a formula
defining the target set $R$, then the formula $\langle\langle \mathit{Player1} \rangle\rangle \Diamond p_R$
is true exactly in the sure-reachability states.

In this section, we generalize the logic to Randomized
ATL (RATL). The logic RATL is defined with respect to randomized
strategies, and distinguishes between three kinds
of satisfaction for path properties: sure satisfaction, almost-sure
satisfaction, and limit-sure satisfaction; correspondingly,
the single path quantifier $\langle\langle\,\rangle\rangle$ of ATL is replaced by
the three quantifiers $\langle\langle\,\rangle\rangle_{sure}$, $\langle\langle\,\rangle\rangle_{almost}$, and $\langle\langle\,\rangle\rangle_{limit}$. For
example, the formula $\langle\langle \mathit{Player1} \rangle\rangle_{almost} \Diamond p_R$ will be true exactly
in the almost-sure-reachability states.

Formally, a system $\mathcal{S} = (n, S, \mathit{Moves}, \Gamma, \delta, Q, L)$ consists
of a number $n > 0$ of agents, a finite state space
$S$, a finite set $\mathit{Moves}$ of moves, a move assignment
$\Gamma : S \times \{1, \ldots, n\} \mapsto 2^{\mathit{Moves}} \setminus \emptyset$, a transition function
$\delta : S \times \mathit{Moves}^n \mapsto S$, a finite set $Q$ of propositions, and a
function $L : S \mapsto 2^Q$ that labels each state with the propositions
that are true in the state. Thus, a system with $n$ agents
is a labeled $n$-player game structure: at every state $s \in S$,
each agent $i \in \{1, \ldots, n\}$ chooses a move $a_i \in \Gamma(s, i)$, and
the game proceeds to the state $\delta(s, a_1, \ldots, a_n)$. Typically,
the agents model individual processes, or components, of
a reactive program. The paths of $\mathcal{S}$ are defined in analogy
to two-player game structures. A strategy $\pi_A$ for a
(possibly empty) set $A = \{i_1, \ldots, i_k\} \subseteq \{1, \ldots, n\}$ of
agents is a mapping $\pi_A : S^+ \mapsto \mathcal{D}(\mathit{Moves}^k)$ such that
$\pi_A(\sigma s)(a_1, \ldots, a_k) > 0$ implies $a_j \in \Gamma(s, i_j)$ for all
$1 \le j \le k$. Given a set $A$ of agents, we denote by $\Pi_A$ the
set of strategies for $A$.

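The temporal logic RATL is defined with respect to a
set $Q$ of propositions and a set $\Sigma = \{1, \ldots, n\}$ of agents.
A Randomized ATL formula is one of the following:

• $q$, for propositions $q \in Q$.

• $\neg\varphi$ or $\varphi \vee \psi$, where $\varphi$ and $\psi$ are RATL formulas.

• $\langle\langle A \rangle\rangle_{win} \bigcirc \varphi$ or $\langle\langle A \rangle\rangle_{win} \Box \varphi$ or $\langle\langle A \rangle\rangle_{win}\, \varphi \,\mathcal{U}\, \psi$, where
$A \subseteq \{1, \ldots, n\}$ is a set of agents, $win \in \{sure, almost, limit\}$
is a type of winning condition,
and $\varphi$ and $\psi$ are RATL formulas.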

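The operators $\langle\langle\,\rangle\rangle_{win}$ are path quantifiers, and $\bigcirc$ (“next”), $\Box$
(“always”), and $\mathcal{U}$ (“until”) are the usual temporal operators
[MP91]. We interpret RATL formulas over the states of
a system $\mathcal{S}$ that has the same sets of agents and propositions
used to define the formulas. The subformulas of RATL of
the form $\bigcirc\varphi$, $\Box\varphi$, or $\varphi \,\mathcal{U}\, \psi$ are called path subformulas,
and they are interpreted over the paths of $\mathcal{S}$. For a path
subformula $\theta$, we denote by $[\theta]$ the event consisting of all
paths that satisfy $\theta$, as defined by the standard semantics of
the temporal operators. Subformulas of RATL of the form
$q$, $\neg\varphi$, $\varphi \vee \psi$, or $\langle\langle A \rangle\rangle_{win}\, \theta$ are called state subformulas,
and they are interpreted over the states of $\mathcal{S}$. For a state
subformula $\varphi$, we write $s \models \varphi$ to indicate that the state $s$
satisfies $\varphi$. We present here only the semantics for state
subformulas of the form $\langle\langle A \rangle\rangle_{win}\, \theta$; the propositional and
boolean cases are standard. For a path subformula $\theta$, we
define:

• $s \models \langle\langle A \rangle\rangle_{sure}\, \theta$ iff there exists $\pi_A \in \Pi_A$
such that for all $\pi_{\Sigma \setminus A} \in \Pi_{\Sigma \setminus A}$ we have
$\mathit{Outcomes}(s, \pi_A, \pi_{\Sigma \setminus A}) \subseteq [\theta]$.

• $s \models \langle\langle A \rangle\rangle_{almost}\, \theta$ iff there exists $\pi_A \in \Pi_A$ such that
for all $\pi_{\Sigma \setminus A} \in \Pi_{\Sigma \setminus A}$ we have $\Pr_s^{\pi_A, \pi_{\Sigma \setminus A}}([\theta]) = 1$.

• $s \models \langle\langle A \rangle\rangle_{limit}\, \theta$ iff

$$\sup_{\pi_A \in \Pi_A} \; \inf_{\pi_{\Sigma \setminus A} \in \Pi_{\Sigma \setminus A}} \Pr\nolimits_s^{\pi_A, \pi_{\Sigma \setminus A}}([\theta]) = 1.$$

In particular, the logic ATL is the fragment of RATL where
the only path quantifier is $\langle\langle A \rangle\rangle_{sure}$.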

If $s \models \langle\langle A \rangle\rangle_{win}\, \theta$, for $win \in \{sure, almost, limit\}$, then
the winning strategies provide a controller $C$ for the set
of agents $A$. When the controller $C$ is composed with
the set $A$ of agents, the resulting system is guaranteed to
satisfy $\theta$ with $win$ confidence. If $win = sure$, then the
controller can always be chosen to be deterministic and
memoryless. If $win \in \{almost, limit\}$, then the controller
can still be chosen to be memoryless, but it may need to be
randomized.

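From the classification of winning states in Section 2, it
follows that $s \models \langle\langle A \rangle\rangle_{sure}\, \theta$ implies $s \models \langle\langle A \rangle\rangle_{almost}\, \theta$, which
in turn implies $s \models \langle\langle A \rangle\rangle_{limit}\, \theta$; the reverse implications do
not necessarily hold. Interestingly, the implications can be
strict only for path subformulas $\theta$ of the form $\varphi \,\mathcal{U}\, \psi$, which
specify liveness-like properties (such as reachability). By
contrast, for path subformulas $\theta$ of the form $\bigcirc\varphi$ and $\Box\varphi$,
which specify safety-like properties, the three winning conditions
are equivalent.

Theorem 4 Consider a path subformula $\theta$ of the form
$\bigcirc\varphi$ or $\Box\varphi$. Then, for every state $s$ of a system $\mathcal{S}$, we have
$s \models \langle\langle A \rangle\rangle_{sure}\, \theta$ iff $s \models \langle\langle A \rangle\rangle_{almost}\, \theta$ iff $s \models \langle\langle A \rangle\rangle_{limit}\, \theta$.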

The model-checking problem for RATL asks, given a
system $\mathcal{S}$ and an RATL formula $\varphi$, for the set of states of
$\mathcal{S}$ that satisfy $\varphi$. A model-checking algorithm for RATL
can proceed bottom-up on the state subformulas of $\varphi$, as
in CTL and ATL model checking [CE81, QS81, AHK97].
The nontrivial cases are $\langle\langle A \rangle\rangle_{sure}\, \varphi \,\mathcal{U}\, \psi$, $\langle\langle A \rangle\rangle_{almost}\, \varphi \,\mathcal{U}\, \psi$,
and $\langle\langle A \rangle\rangle_{limit}\, \varphi \,\mathcal{U}\, \psi$. The subformula $\langle\langle A \rangle\rangle_{sure}\, \varphi \,\mathcal{U}\, \psi$ can be
checked as in ATL. In order to check the other two subformulas,
we first construct a two-player game structure,
in which player 1 corresponds to the set $A$ of agents, and
player 2 corresponds to the set $\Sigma \setminus A$. We define the target
set to be $R = \{ s \in S \mid s \models \psi \}$. If $R$ is not absorbing,
we locally modify the game structure to make it absorbing.
To check the subformula $\langle\langle A \rangle\rangle_{almost}\, \varphi \,\mathcal{U}\, \psi$, we modify Algorithm 2,
so that $U_0 = \{ s \in S \mid s \models \varphi \vee \psi \}$. To check
the subformula $\langle\langle A \rangle\rangle_{limit}\, \varphi \,\mathcal{U}\, \psi$, we modify Algorithm 5, so
that $U_0 = \{ s \in S \mid s \models \varphi \vee \psi \}$. Intuitively, while in the
$\Diamond R$ reachability game player 1 only has to avoid states in
which player 2 can keep him away from the target set $R$, in
the $\varphi \,\mathcal{U}\, \psi$ game player 1 also has to avoid states that satisfy
neither $\varphi$ nor $\psi$.

Theorem 5 The model-checking problem for RATL specifications
can be solved in time quadratic in the size of the
system and linear in the size of the formula.